Web robot detection in scholarly Open Access institutional repositories
Author
Abstract
Purpose – This paper investigates the impact of web robots on the usage statistics collected by Open Access institutional repositories (IRs), and the techniques for mitigating that impact.
Design/methodology/approach – A literature review provides a comprehensive list of web robot detection techniques. System documentation and open source code are reviewed, and personal interviews conducted, to compare the robot detection techniques used in the major IR platforms. An empirical test, based on a simple random sample of downloads with 96.20% certainty, measures the accuracy of an IR's web robot detection at a large Irish university.
Findings – While web robot detection is not ignored in IRs, there are areas where the two main systems could be improved. The technique tested here detected 94.18% of the web robots visiting the site over a two-year period (recall), with a precision of 98.92%. Because robot activity in repositories is so high, correctly labelling more robots has an exponential effect on the accuracy of usage statistics.
Limitations – This study is performed on a single repository running a single system. Future studies across multiple sites and platforms are needed to determine the accuracy of web robot detection in OA repositories generally.
Originality/value – This is the only study to date to have investigated web robot detection in IRs. It puts forward the first empirical benchmark of accuracy in IR usage statistics.
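The Findings link a robot filter's recall and precision to the accuracy of the resulting usage statistics. The minimal sketch below shows that arithmetic under stated assumptions: the function name `filter_metrics` and all counts are hypothetical and made up for illustration (they are not the paper's data), and "downloads" are treated as individually labelled events. It shows why, when robots make up most of the traffic, every missed robot is large relative to the true human count, so a few points of recall shift reported usage substantially.

```python
# Hypothetical illustration (not the paper's method or data): how the recall
# and precision of a robot filter translate into error in the "human"
# download count that a repository reports.

def filter_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (precision, recall, relative error of the reported human count).

    tp: robot downloads flagged as robots
    fp: human downloads wrongly flagged as robots
    fn: robot downloads missed (left in the statistics)
    tn: human downloads correctly kept
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    reported_human = fn + tn   # what survives the filter and gets counted
    actual_human = fp + tn     # what really came from people
    rel_error = (reported_human - actual_human) / actual_human
    return precision, recall, rel_error


# Made-up sample: 1,000 downloads, 850 from robots and 150 from humans.
for label, tp, fn in [("94% recall", 799, 51), ("98% recall", 833, 17)]:
    p, r, e = filter_metrics(tp=tp, fp=8, fn=fn, tn=142)
    print(f"{label}: precision {p:.2%}, recall {r:.2%}, human count off by {e:+.1%}")
# 94% recall: precision 99.01%, recall 94.00%, human count off by +28.7%
# 98% recall: precision 99.05%, recall 98.00%, human count off by +6.0%
```

With 85% of the hypothetical traffic coming from robots, raising recall by four points cuts the overcount of human downloads from about 29% to 6%; with lower robot shares the same gain in recall would matter far less.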
Similar resources
Institutional Repositories: Faculty Deposits, Marketing, and the Reform of Scholarly
This study explores faculty deposits in institutional repositories (IR) within selected disciplines and identifies the diverse navigational paths to IR sites from library Web site homepages. The statistical relationship between the development of an IR and the presence of a Web site dedicated to the reform of traditional scholarly communication is also explored. The implications for the develop...
Completeness and overlap in open access systems: Search engines, aggregate institutional repositories and physics-related open sources
This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open sources (arXiv.org and Astrophysics Data System). The 2001-2013 Nobel Laureates in Physics served as the samp...
Institutional Repositories: Evaluating the Reasons for Non-use of Cornell University's Installation of DSpace
Problem: While there has been considerable attention dedicated to the development and implementation of institutional repositories, there has been little done to evaluate them, especially with regard to faculty participation. Purpose: This article reports on a three-part evaluative study of institutional repositories. We describe the contents and participation in Cornell’s DSpace and compare t...
The Open Access Availability of Library and Information Science Literature
To examine the open access availability of Library and Information Science (LIS) research, a study was conducted using Google Scholar to search for articles from 20 top LIS journals. The study examined whether Google Scholar was able to find any links to full text, if open access versions of the articles were available and where these articles were being hosted. The results showed that the arch...
PLEIADI, a Portal Solution for Scholarly Literature
The PLEIADI Project (acronym for “Portale per la Letteratura scientifica Elettronica Italiana su Archivi aperti e Depositi Istituzionali”, a portal for Italian scholarly e-literature in open archives and institutional repositories) originated from the collaboration between two major Italian university consortia, CASPUR and CILEA, within the framework of the AEPIC project. PLEIADI aims at buildi...
Journal title:
- Library Hi Tech
Volume 34, Issue
Pages -
Publication date: 2016